Space-Efficient String Indexing for Wildcard Pattern Matching

Authors

  • Moshe Lewenstein
  • Yakov Nekrich
  • Jeffrey Scott Vitter
Abstract

In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant-size alphabet our data structure uses O(n log^ε n) bits for any ε > 0 and reports all occ occurrences of a wildcard string in O(m + σ^g · μ(n) + occ) time, where μ(n) = o(log log log n), σ is the alphabet size, m is the number of alphabet symbols and g is the number of wildcard symbols in the query string. We also present an O(n)-bit index with O((m + σ^g + occ) log^ε n) query time and an O(n (log log n)^2)-bit index with O((m + σ^g + occ) log log n) query time. These are the first non-trivial data structures for this problem that need o(n log n) bits of space.
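
To make the query model concrete, the following is a minimal sketch of a naive baseline and not the compressed index described in the abstract. It indexes the text with a plain, uncompressed suffix array and answers a wildcard query by trying every way of filling the g wildcards with alphabet symbols, which shows one simple way a factor exponential in g can arise. The wildcard symbol `?` and the function names are assumptions made for illustration.

```python
from itertools import product

WILDCARD = "?"  # assumed wildcard symbol; the abstract does not fix one


def build_suffix_array(text):
    """Sort suffix start positions lexicographically (simple, not linear-time)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, suffix_array, pattern, upper):
    """Binary search over suffixes, comparing only their first len(pattern) symbols."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[suffix_array[mid]:suffix_array[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_exact(text, suffix_array, pattern):
    """Return all start positions of a wildcard-free pattern."""
    lo = _bound(text, suffix_array, pattern, upper=False)
    hi = _bound(text, suffix_array, pattern, upper=True)
    return suffix_array[lo:hi]


def find_wildcard(text, suffix_array, pattern, alphabet):
    """Try every instantiation of the wildcards and collect the exact matches."""
    slots = [i for i, c in enumerate(pattern) if c == WILDCARD]
    matches = []
    for symbols in product(alphabet, repeat=len(slots)):
        candidate = list(pattern)
        for slot, symbol in zip(slots, symbols):
            candidate[slot] = symbol
        matches.extend(find_exact(text, suffix_array, "".join(candidate)))
    return sorted(matches)


text = "abracadabra"
suffix_array = build_suffix_array(text)
alphabet = sorted(set(text))
print(find_wildcard(text, suffix_array, "a?a", alphabet))
```

Each text position can match at most one instantiation of the pattern, so the collected positions contain no duplicates. This baseline stores the suffix array as ordinary integers, which takes on the order of n log n bits, the space bound that the indexes in the abstract improve on.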

Similar resources

Randomization in Parallel Stringology

In this abstract, we provide an overview of our survey of randomized techniques for exploiting the parallelism in string matching problems. Broadly, the study of string matching falls into two categories: standard stringology and nonstandard stringology. Standard Stringology concerns the study of various exact matching problems. The fundamental problem here is the basic string matching problem ...

Cross-Document Pattern Matching

We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time ...

Succincter Text Indexing with Wildcards

We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs)—positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity of previous approaches by giv...

SWiM: Secure Wildcard Pattern Matching From OT Extension

Suppose a server holds a long text string and a receiver holds a short pattern string. Secure pattern matching allows the receiver to learn the locations in the long text where the pattern appears, while leaking nothing else to either party besides the length of their inputs. In this work we consider secure wildcard pattern matching (WPM), where the receiver’s pattern is allowed to contain wild...

Simple deterministic wildcard matching

We present a simple and fast deterministic solution to the string matching with don’t cares problem. The task is to determine all positions in a text where a pattern occurs, allowing both pattern and text to contain single character wildcards. Our algorithm takes O(n logm) time for a text of length n and a pattern of length m and in our view the algorithm is conceptually simpler than previous a...

Publication date: 2014